AITopics | assignment step

A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) by using the distance bounding approach of Elkan (2003). We argue that, when incorporating distance bounds into a mini-batch algorithm, already used data should preferentially be reused. To this end we propose using nested mini-batches, whereby data in a mini-batch at iteration t is automatically reused at iteration t+1. Using nested mini-batches presents two difficulties. The first is that unbalanced use of data can bias estimates, which we resolve by ensuring that each data sample contributes exactly once to centroids. The second is in choosing mini-batch sizes, which we address by balancing premature fine-tuning of centroids with redundancy induced slow-down. Experiments show that the resulting nmbatch algorithm is very effective, often arriving within 1% of the empirical minimum 100 times earlier than the standard mini-batch algorithm.

artificial intelligence, centroid, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.29)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Fixed-sized clusters $k$-Means

Malinen, Mikko I., Fränti, Pasi

arXiv.org Artificial Intelligence, Jan-27-2025

We present a $k$-means-based clustering algorithm, which optimizes the mean square error, for given cluster sizes. A straightforward application is balanced clustering, where all clusters have the same size. In the $k$-means assignment phase, the algorithm solves an assignment problem using the Hungarian algorithm. This makes the time complexity of the assignment phase $O(n^3)$, which enables clustering of datasets of more than 5000 points.
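The assignment phase described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes squared Euclidean costs and models each cluster of size s as s identical "slots", so that assigning n points to n slots becomes a square assignment problem. The names `hungarian` and `balanced_assign` are hypothetical.

```python
def hungarian(cost):
    """Minimum-cost perfect matching on a square cost matrix in O(n^3).

    Returns a list `match` with match[row] = column."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials (1-indexed)
    v = [0.0] * (n + 1)          # column potentials (1-indexed)
    p = [0] * (n + 1)            # p[col] = row matched to col, 0 = unmatched
    way = [0] * (n + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:                # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match


def balanced_assign(points, centroids, sizes):
    """Assign points to clusters so that cluster c receives exactly sizes[c] points."""
    # One slot per unit of capacity: cluster c appears sizes[c] times.
    slots = [c for c, s in enumerate(sizes) for _ in range(s)]
    cost = [[sum((a - b) ** 2 for a, b in zip(x, centroids[c])) for c in slots]
            for x in points]
    match = hungarian(cost)
    return [slots[col] for col in match]


points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (10.0, 0.0)]
labels = balanced_assign(points, [(0.0, 0.0), (10.0, 0.0)], sizes=[2, 2])
print(labels)
```

In this toy input three points lie near the first centroid, so a nearest-centroid rule would give cluster sizes 3 and 1; the size constraint forces the point at 0.2 into the second cluster instead.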

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.16113

Country:

Europe > Finland > North Karelia > Joensuu (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.05)
North America > United States > California > San Francisco County > San Francisco (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Nested Mini-Batch K-Means

Neural Information Processing Systems, Mar-12-2024, 14:28:06 GMT

A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) by using the distance bounding approach of Elkan (2003). We argue that, when incorporating distance bounds into a mini-batch algorithm, already used data should preferentially be reused. To this end we propose using nested mini-batches, whereby data in a mini-batch at iteration t is automatically reused at iteration t + 1. Using nested mini-batches presents two difficulties. The first is that unbalanced use of data can bias estimates, which we resolve by ensuring that each data sample contributes exactly once to centroids. The second is in choosing mini-batch sizes, which we address by balancing premature fine-tuning of centroids with redundancy induced slow-down. Experiments show that the resulting nmbatch algorithm is very effective, often arriving within 1% of the empirical minimum 100 times earlier than the standard mini-batch algorithm.

algorithm, centroid, premature fine-tuning, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Memetic Differential Evolution Methods for Semi-Supervised Clustering

Mansueto, Pierluigi, Schoen, Fabio

arXiv.org Artificial Intelligence, Mar-7-2024

In this paper, we deal with semi-supervised Minimum Sum-of-Squares Clustering (MSSC) problems where background knowledge is given in the form of instance-level constraints. In particular, we take into account "must-link" and "cannot-link" constraints, each of which indicates whether two dataset points should be associated with the same cluster or with different clusters. The presence of such constraints makes the problem at least as hard as its unsupervised version: it is no longer true that each point is associated with its nearest cluster center, which requires modifications to crucial operations such as the assignment step. In this scenario, we propose a novel memetic strategy based on the Differential Evolution paradigm, directly extending a state-of-the-art framework recently proposed in the unsupervised clustering literature. As far as we know, our contribution represents the first attempt to define a memetic methodology designed to generate a (hopefully) optimal feasible solution for the semi-supervised MSSC problem. The proposal is compared with some state-of-the-art algorithms from the literature on a set of well-known datasets, highlighting its effectiveness and efficiency in finding good quality clustering solutions.
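To illustrate why constraints change the assignment step, here is a small sketch of a greedy constrained assignment: each point takes the nearest centroid that does not conflict with points labelled before it. This is a simple heuristic chosen for illustration, not the memetic method the abstract proposes; the function names and the pair-of-indices encoding of constraints are assumptions.

```python
def feasible(i, c, labels, must_link, cannot_link):
    """True if giving point i the label c violates no constraint so far."""
    for a, b in must_link:
        if i in (a, b):
            other = b if a == i else a
            # A must-link partner that is already labelled must share the label.
            if labels[other] is not None and labels[other] != c:
                return False
    for a, b in cannot_link:
        if i in (a, b):
            other = b if a == i else a
            # A cannot-link partner must not already hold the label.
            if labels[other] == c:
                return False
    return True


def constrained_assign(points, centroids, must_link, cannot_link):
    """Greedy assignment under must-link / cannot-link constraints.

    Returns a list of labels, or None if some point has no feasible cluster."""
    labels = [None] * len(points)
    for i, x in enumerate(points):
        # Candidate clusters ordered from nearest to farthest centroid.
        order = sorted(
            range(len(centroids)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
        for c in order:
            if feasible(i, c, labels, must_link, cannot_link):
                labels[i] = c
                break
        else:
            return None  # the greedy order reached a dead end
    return labels


pts = [(0.0,), (1.0,), (9.0,), (10.0,)]
cents = [(0.0,), (10.0,)]
print(constrained_assign(pts, cents, must_link=[], cannot_link=[(0, 1)]))
```

With the cannot-link pair (0, 1), the point at 1.0 is pushed to the far centroid even though the near one is closer, which is exactly the break from the nearest-center rule that the abstract mentions. A greedy pass can also fail on inputs that do have a feasible labelling, which is one reason more elaborate strategies are studied.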

algorithm, assignment step, constraint, (15 more...)

arXiv.org Artificial Intelligence

2403.04322

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report > Promising Solution (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.93)

Wasserstein k-means with sparse simplex projection

Fukunaga, Takumi, Kasai, Hiroyuki

arXiv.org Machine Learning, Nov-25-2020

This paper proposes a faster Wasserstein $k$-means algorithm for histogram data by reducing Wasserstein distance computations and exploiting sparse simplex projection. We shrink data samples, centroids, and the ground cost matrix, which leads to a considerable reduction of the computations used to solve optimal transport problems without loss of clustering quality. Furthermore, we dynamically reduce the computational complexity by removing lower-valued data samples and harnessing sparse simplex projection while keeping the degradation of clustering quality low. We designate this proposed algorithm as sparse simplex projection based Wasserstein $k$-means, or SSPW $k$-means. Numerical evaluations against the standard Wasserstein $k$-means algorithm demonstrate the effectiveness of the proposed SSPW $k$-means on real-world datasets.
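One common way to build a sparse projection onto the probability simplex is to keep the largest entries of a vector and project only those onto the simplex. The sketch below shows that construction. Whether it matches the exact operator used in SSPW $k$-means is an assumption, and the function names and the parameter `k` (number of entries kept) are hypothetical.

```python
def project_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    running = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        t = (running - 1.0) / j
        if uj - t > 0:       # entry j stays positive after shifting by t
            theta = t
    return [max(x - theta, 0.0) for x in v]


def sparse_simplex_projection(v, k):
    """Keep the k largest entries of v, project them onto the simplex, zero the rest."""
    keep = sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k]
    projected = project_simplex([v[i] for i in keep])
    out = [0.0] * len(v)
    for i, p in zip(keep, projected):
        out[i] = p
    return out


hist = [0.5, 0.3, 0.15, 0.05]
print(sparse_simplex_projection(hist, k=2))
```

The result is still a valid histogram (non-negative, summing to one) but has at most `k` non-zero bins, which is what makes the downstream transport problems smaller.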

algorithm, data sample, projection, (14 more...)

arXiv.org Machine Learning

2011.12542

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Fast K-Means Clustering with Anderson Acceleration

Zhang, Juyong, Yao, Yuxin, Peng, Yue, Yu, Hao, Deng, Bailin

arXiv.org Machine Learning, May-27-2018

We propose a novel method to accelerate Lloyd's algorithm for K-Means clustering. Unlike previous acceleration approaches that reduce computational cost per iteration or improve initialization, our approach is focused on reducing the number of iterations required for convergence. This is achieved by treating the assignment step and the update step of Lloyd's algorithm as a fixed-point iteration, and applying Anderson acceleration, a well-established technique for accelerating fixed-point solvers. Classical Anderson acceleration utilizes m previous iterates to find an accelerated iterate, and its performance on K-Means clustering can be sensitive to choice of m and the distribution of samples. We propose a new strategy to dynamically adjust the value of m, which achieves robust and consistent speedups across different problem instances. Our method complements existing acceleration techniques, and can be combined with them to achieve state-of-the-art performance. We perform extensive experiments to evaluate the performance of the proposed method, where it outperforms other algorithms in 106 out of 120 test cases, and the mean decrease ratio of computational time is more than 33%.
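The idea of accelerating a fixed-point iteration with previous iterates can be seen in its simplest form with m = 1 on a scalar map. The sketch below is only that toy case, assuming a generic contraction `g` rather than Lloyd's assignment-and-update map, and it does not include the dynamic adjustment of m that the abstract describes. In one dimension, Anderson acceleration with m = 1 coincides with the secant method applied to the residual g(x) - x.

```python
import math


def fixed_point(g, x0, tol=1e-10, max_iter=1000):
    """Plain iteration x <- g(x). Returns (approximate fixed point, iterations)."""
    x = x0
    for k in range(1, max_iter + 1):
        gx = g(x)
        if abs(gx - x) < tol:
            return gx, k
        x = gx
    return x, max_iter


def anderson_m1(g, x0, tol=1e-10, max_iter=1000):
    """Anderson acceleration with one previous iterate (m = 1), scalar case."""
    g_prev = g(x0)
    f_prev = g_prev - x0            # residual at the previous iterate
    x = g_prev                      # first step is a plain fixed-point step
    for k in range(1, max_iter + 1):
        gx = g(x)
        f = gx - x                  # residual at the current iterate
        if abs(f) < tol:
            return gx, k
        df = f - f_prev
        # theta minimises |f - theta * (f - f_prev)|; fall back to a plain step if df is 0.
        theta = f / df if df != 0.0 else 0.0
        x_next = gx - theta * (gx - g_prev)
        g_prev, f_prev = gx, f
        x = x_next
    return x, max_iter


x_plain, it_plain = fixed_point(math.cos, 1.0)
x_acc, it_acc = anderson_m1(math.cos, 1.0)
print(it_plain, it_acc)
```

Both loops reach the same fixed point of cos, but the accelerated one needs far fewer evaluations of `g`. For K-Means, `g` would map a set of centroids to the centroids after one assignment and update pass, and the mixing coefficients would come from a small least-squares problem over m residual differences.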

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1805.10638

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

FLIC: Fast Linear Iterative Clustering With Active Search

Zhao, Jiaxing (Nankai University) | Ren, Bo (Nankai University) | Hou, Qibin (Nankai University) | Cheng, Ming-Ming (Nankai University) | Rosin, Paul (Cardiff University)

AAAI Conferences, Feb-8-2018

In this paper, we reconsider the clustering problem for image over-segmentation from a new perspective. We propose a novel search algorithm named "active search" which explicitly considers neighboring continuity. Based on this search method, we design a back-and-forth traversal strategy and a "joint" assignment and update step to speed up the algorithm. Compared to earlier works, such as Simple Linear Iterative Clustering (SLIC) and its follow-ups, which use fixed search regions and perform the assignment and the update step separately, our novel scheme reduces the number of iterations before convergence and improves the boundary sensitivity of the over-segmentation results. Extensive evaluations on the Berkeley segmentation benchmark verify that our method outperforms competing methods under various evaluation metrics. In particular, the lowest time cost is reported among existing methods (approximately 30 fps for a 481×321 image on a single CPU core). To facilitate the development of over-segmentation, the code will be publicly available.

Nested Mini-Batch K-Means

Newling, James, Fleuret, François

Neural Information Processing Systems, Dec-31-2016

A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) by using the distance bounding approach of Elkan (2003). We argue that, when incorporating distance bounds into a mini-batch algorithm, already used data should preferentially be reused. To this end we propose using nested mini-batches, whereby data in a mini-batch at iteration t is automatically reused at iteration t+1. Using nested mini-batches presents two difficulties. The first is that unbalanced use of data can bias estimates, which we resolve by ensuring that each data sample contributes exactly once to centroids. The second is in choosing mini-batch sizes, which we address by balancing premature fine-tuning of centroids with redundancy induced slow-down. Experiments show that the resulting nmbatch algorithm is very effective, often arriving within 1% of the empirical minimum 100 times earlier than the standard mini-batch algorithm.
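The two ingredients named in this abstract, nested batches and a single contribution per sample, can be sketched in a few lines. This is a simplified illustration rather than the nmbatch algorithm itself: it omits the distance bounds entirely, assumes the data are already shuffled, and doubles the batch after every round instead of using the paper's criterion for when to grow it. The function name and parameters `b0` and `rounds` are hypothetical.

```python
def nested_minibatch_kmeans(data, init_centroids, b0=2, rounds=10):
    """K-means over nested batches data[:b], with b doubling each round.

    Each sample contributes to exactly one centroid sum at any time: when a
    sample changes cluster, its old contribution is withdrawn first."""
    k, dim = len(init_centroids), len(data[0])
    centroids = [list(c) for c in init_centroids]
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    labels = [None] * len(data)
    b = min(b0, len(data))
    for _ in range(rounds):
        for i in range(b):                     # data[:b] contains the previous batch
            x = data[i]
            new = min(
                range(k),
                key=lambda c: sum((a - m) ** 2 for a, m in zip(x, centroids[c])),
            )
            old = labels[i]
            if old == new:
                continue                       # contribution already in place
            if old is not None:                # withdraw the stale contribution
                counts[old] -= 1
                for d in range(dim):
                    sums[old][d] -= x[d]
            counts[new] += 1
            for d in range(dim):
                sums[new][d] += x[d]
            labels[i] = new
        for c in range(k):                     # centroid = mean of its current members
            if counts[c] > 0:
                centroids[c] = [s / counts[c] for s in sums[c]]
        b = min(2 * b, len(data))
    return centroids, labels, counts


data = [(0.0,), (10.0,), (1.0,), (11.0,), (0.5,), (10.5,), (0.2,), (10.8,)]
centroids, labels, counts = nested_minibatch_kmeans(data, [(0.0,), (10.0,)], rounds=5)
print(centroids, counts)
```

Because every batch contains the previous one, any per-sample state kept between rounds (here only the label, in the paper also the distance bounds) is reused on the very next round rather than being discarded with the batch.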

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Nested Mini-Batch K-Means

Newling, James, Fleuret, François

arXiv.org Machine Learning, Sep-12-2016

A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) by using the distance bounding approach of Elkan (2003). We argue that, when incorporating distance bounds into a mini-batch algorithm, already used data should preferentially be reused. To this end we propose using nested mini-batches, whereby data in a mini-batch at iteration t is automatically reused at iteration t+1. Using nested mini-batches presents two difficulties. The first is that unbalanced use of data can bias estimates, which we resolve by ensuring that each data sample contributes exactly once to centroids. The second is in choosing mini-batch sizes, which we address by balancing premature fine-tuning of centroids with redundancy induced slow-down. Experiments show that the resulting nmbatch algorithm is very effective, often arriving within 1% of the empirical minimum 100 times earlier than the standard mini-batch algorithm.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1602.02934

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Fast K-Means with Accurate Bounds

Newling, James, Fleuret, François

arXiv.org Machine Learning, Sep-11-2016

We propose a novel accelerated exact k-means algorithm, which performs better than the current state-of-the-art low-dimensional algorithm in 18 of 22 experiments, running up to 3 times faster. We also propose a general improvement of existing state-of-the-art accelerated exact k-means algorithms through better estimates of the distance bounds used to reduce the number of distance calculations, and get a speedup in 36 of 44 experiments, up to 1.8 times faster. We have conducted experiments with our own implementations of existing methods to ensure homogeneous evaluation of performance, and we show that our implementations perform as well or better than existing available implementations. Finally, we propose simplified variants of standard approaches and show that they are faster than their fully-fledged counterparts in 59 of 62 experiments.
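Accelerated exact k-means algorithms of the kind discussed here rest on the triangle inequality: if the distance between centroids c and c' is at least twice the distance from a point x to c, then c' cannot be closer to x than c, so the distance to c' need not be computed. The sketch below shows only this basic test inside one assignment pass. It is not the paper's algorithm and keeps no bounds between iterations; the function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def assign_with_pruning(points, centroids):
    """Nearest-centroid assignment that skips provably useless distance computations.

    Returns (labels, number of point-to-centroid distances actually computed)."""
    k = len(centroids)
    # Inter-centroid distances, computed once per assignment pass.
    cc = [[dist(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]
    labels = []
    computed = 0
    for x in points:
        best, d_best = 0, dist(x, centroids[0])
        computed += 1
        for c in range(1, k):
            # d(x, c) >= cc[best][c] - d_best >= d_best, so c cannot win.
            if cc[best][c] >= 2.0 * d_best:
                continue
            d = dist(x, centroids[c])
            computed += 1
            if d < d_best:
                best, d_best = c, d
        labels.append(best)
    return labels, computed


pts = [(0.1, 0.0), (0.2, 0.1), (9.9, 0.0), (5.0, 5.1)]
cents = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
labels, computed = assign_with_pruning(pts, cents)
print(labels, computed, "of", len(pts) * len(cents))
```

The labels are identical to those of an exhaustive search; only the number of distance evaluations changes. Tighter bounds, which are the subject of the abstract, let the skip condition fire more often.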

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1602.02514

Country: